Providing Diversity in K-Nearest Neighbor Query Results
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)
queries return the K closest answers according to a given distance metric in the
database with respect to Q. In this scenario, it is possible that a majority of
the answers may be very similar to each other, especially when the data has
clusters. For a variety of applications, such homogeneous result sets may not
add value to the user. In this paper, we consider the problem of providing
diversity in the results of KNN queries, that is, to produce the closest result
set such that each answer is sufficiently different from the rest. We first
propose a user-tunable definition of diversity, and then present an algorithm,
called MOTLEY, for producing a diverse result set as per this definition.
Through a detailed experimental evaluation on real and synthetic data, we show
that MOTLEY can produce diverse result sets by reading only a small fraction of
the tuples in the database. Further, it imposes no additional overhead on the
evaluation of traditional KNN queries, thereby providing a seamless interface
between diversity and distance.
Comment: 20 pages, 11 figures
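The idea of trading distance against diversity can be illustrated with a small greedy sketch. This is not MOTLEY itself, whose details the abstract does not give. It assumes Euclidean distance for both closeness and diversity, and the function name `diverse_knn` and the threshold parameter `min_div` are hypothetical names introduced here for illustration only.

```python
import math


def diverse_knn(points, q, k, min_div):
    """Greedy sketch (an assumption, not the paper's algorithm).

    Scan points in order of increasing distance to q and keep a point only
    if it lies at least min_div away from every point already kept.
    """
    ordered = sorted(points, key=lambda p: math.dist(p, q))
    result = []
    for p in ordered:
        if all(math.dist(p, r) >= min_div for r in result):
            result.append(p)
            if len(result) == k:
                break
    return result


points = [(0, 0), (0.1, 0), (0.2, 0), (5, 0), (5.1, 0), (0, 4)]
plain = diverse_knn(points, (0, 0), 3, 0.0)    # min_div = 0: ordinary KNN
diverse = diverse_knn(points, (0, 0), 3, 1.0)  # skips near-duplicates
```

With `min_div` set to 0 the function behaves like an ordinary KNN query, and raising the threshold pushes the result away from tightly clustered answers.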
Understanding Open Defecation in the Age of Swachh Bharat Abhiyan: Agency, Accountability, and Anger in Rural Bihar.
Swachh Bharat Abhiyan, India's flagship sanitation intervention, set out to end open defecation by October 2019. While the program improved toilet coverage nationally, large regional disparities in construction and use remain. Our study used ethnographic methods to explore perspectives on open defecation and latrine use, and the socio-economic and political reasons for these perspectives, in rural Bihar. We draw on insights from social epidemiology and political ecology to explore the structural determinants of latrine ownership and use. Though researchers have often pointed to rural residents' preference for open defecation, we found that people were aware of its many risks. We also found that (i) while sanitation research and "behavior change" campaigns often conflate the reluctance to adopt latrines with a preference for open defecation, this is an erroneous conflation; (ii) a subsidy can help (some) households to construct latrines but the amount of the subsidy and the manner of its disbursement are key to its usefulness; and (iii) widespread resentment towards what many rural residents view as a development bias against rural areas reinforces distrust towards the government overall and its Swachh Bharat Abhiyan-funded latrines in particular. These social-structural explanations for the slow uptake of sanitation in rural Bihar (and potentially elsewhere) deserve more attention in sanitation research and promotion efforts.
Face Cartoonisation For Various Poses Using StyleGAN
This paper presents an innovative approach to achieve face cartoonisation
while preserving the original identity and accommodating various poses. Unlike
previous methods in this field that relied on conditional-GANs, which posed
challenges related to dataset requirements and pose training, our approach
leverages the expressive latent space of StyleGAN. We achieve this by
introducing an encoder that captures both pose and identity information from
images and generates a corresponding embedding within the StyleGAN latent
space. By subsequently passing this embedding through a pre-trained generator,
we obtain the desired cartoonised output. While many other approaches based on
StyleGAN necessitate a dedicated and fine-tuned StyleGAN model, our method
stands out by utilizing an already-trained StyleGAN designed to produce
realistic facial images. We show by extensive experimentation how our encoder
adapts the StyleGAN output to better preserve identity when the objective is
cartoonisation.
Self-consistency for open-ended generations
In this paper, we present a novel approach for improving the quality and
consistency of generated outputs from large-scale pre-trained language models
(LLMs). Self-consistency has emerged as an effective approach for prompts with
fixed answers, selecting the answer with the highest number of votes. In this
paper, we introduce a generalized framework for self-consistency that extends
its applicability beyond problems that have fixed answers. Through
extensive simulations, we demonstrate that our approach consistently recovers
the optimal or near-optimal generation from a set of candidates. We also
propose lightweight parameter-free similarity functions that show significant
and consistent improvements across code generation, autoformalization, and
summarization tasks, even without access to token log probabilities. Our method
incurs minimal computational overhead, requiring no auxiliary reranker models
or modifications to the existing model.
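One way to picture self-consistency without fixed answers is to pick the candidate that agrees most with all the others. The sketch below is an illustration under stated assumptions, not the paper's method: the Jaccard token-overlap similarity and the names `token_overlap` and `select_most_consistent` are hypothetical choices made here, standing in for whatever similarity functions the authors propose.

```python
def token_overlap(a, b):
    """Jaccard similarity over whitespace tokens (an assumed similarity)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_most_consistent(candidates, similarity=token_overlap):
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_similarity(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], c) for c in others) / len(others)

    # max() keeps the earliest index on ties, so the choice is deterministic.
    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


generations = ["return a + b", "return a + b", "return a - b", "print hello"]
chosen = select_most_consistent(generations)
```

When every candidate is a short fixed answer, such as `["42", "42", "7"]`, this selection coincides with majority voting, which is the sense in which it generalizes the fixed-answer case.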
Relation between Blood Lead Levels and Childhood Anemia in India
Lead pollution is a substantial problem in developing countries such as India. The US Centers for Disease Control and Prevention has defined an elevated blood lead level in children as ≥10 μg/dl, on the basis of neurologic toxicity. The US Environmental Protection Agency suggests a threshold lead level of 20-40 μg/dl for risk of childhood anemia, but there is little information relating lead levels <40 μg/dl to anemia. Therefore, the authors examined the association between lead levels as low as 10 μg/dl and anemia in Indian children under 3 years of age. Anemia was divided into categories of mild (hemoglobin level 10-10.9 g/dl), moderate (hemoglobin level 8-9.9 g/dl), and severe (hemoglobin level <8 g/dl). Lead levels <10 μg/dl were detected in 568 children (53%), whereas 413 (38%) had lead levels ≥10-19.9 μg/dl and 97 (9%) had levels ≥20 μg/dl. After adjustment for child's age, duration of breastfeeding, standard of living, parent's education, father's occupation, maternal anemia, and number of children in the immediate family, children with lead levels ≥10 μg/dl were 1.3 (95% confidence interval: 1.0, 1.7) times as likely to have moderate anemia as children with lead levels <10 μg/dl. Similarly, the odds ratio for severe anemia was 1.7 (95% confidence interval: 1.1, 2.6). Health agencies in India should note the association of elevated blood lead levels with anemia and make further efforts to curb lead pollution and childhood anemia.